Source coding by efficient selection of ground states clusters 
In this letter, we show how the Survey Propagation algorithm can be generalized to include
external forcing messages, and used to address selectively an exponential number of glassy
ground states. These capabilities can be used to explore efficiently the space of solutions of
random NP-complete constraint satisfaction problems, providing direct experimental evidence
of replica symmetry breaking in large-size instances. Finally, a new lossy data compression
protocol is introduced, exploiting as a computational resource the clustered nature of the
space of addressable states.
The combinatorial problem of satisfying a large set of constraints that depend on N discrete
variables is a fundamental one in computer science (optimization and coding theory) as well
as in statistical physics (frustrated systems), and may become extraordinarily difficult even
for randomly generated problems as soon as some control parameters are selected in specific
ranges. While the connection (if any) between worst-case and typical-case computational
complexity of hard (NP-complete) constraint satisfaction problems (CSPs) has not yet been
fully developed, recent advances in the statistical mechanics study of random CSPs have
connected the origin of such computational intractability to the onset of clustering in the
space of optimal assignments and to the associated proliferation of metastable states. An
important byproduct of the analytical studies of random CSPs has been the introduction of a
new class of algorithms, the so-called Survey Propagation (SP) algorithms, specially devised
to deal with the clustering scenario and able to find optimal assignments of benchmark
problems on which all other known optimization algorithms fail.

In this letter we take a step forward in understanding the potential of Survey Propagation
techniques by answering two questions:

(i) Is it possible to explore efficiently the space of solutions of a given random
combinatorial problem?

(ii) Can we use this capability of addressing a large set 
of states for computational purposes? 

The first question relates to the physical issue of probing exactly the topology of the space
of solutions (ground states) in problems that are in a clustered phase, the so-called replica
symmetry breaking (RSB) phase. Surprisingly enough, such geometrical insight is also
important for engineering applications in information theory (e.g. LDPC error-correcting
codes). The second question addresses a new algorithmic perspective in which the presence of
many states becomes a powerful resource of the computational device.

In what follows we shall give a positive answer to both questions by providing an efficient
generalization of SP which is capable of addressing, at a computational cost almost linear in
the size of the problem, an exponential number of different clusters of ground states that
are invisible to the known search algorithms. From the physical side, we provide the first
numerical evidence for the RSB geometric structure for large-size instances of NP-complete
problems. On the computational side, we show that one can indeed take advantage of the
addressability of the set of clusters of ground states to produce a "physical" lossy data
compression scheme [3] with non-trivial performance.

A generic constraint satisfaction problem is defined by N discrete variables (e.g. Boolean
variables, finite sets of colors or Ising spins) which interact through constraints typically
involving a small number of variables. The energies $C_a(\vec\xi)$ of the single constraints
(equal to 0 or 1, depending on whether they are satisfied or not by a given assignment
$\vec\xi$) sum up to give the global energy function $E$ of the problem, and each is a
function of just a small subset of variables $V(a) = \{j_1^a, \ldots, j_K^a\}$ (every variable
$j$ is in turn involved in a subset $V(j)$ of constraints). Since one is interested in
satisfying all the constraints simultaneously, the CSP is equivalent to the problem of looking
for zero-energy ground states. For most NP-complete CSPs the function $E$ can be directly
interpreted as a spin-glass-like Hamiltonian. For instance, the well-known random K-SAT
problem consists in deciding whether M clauses, each taking the form of the OR function of K
variables chosen randomly among N possible ones, can be simultaneously true. The energy
contribution associated to a single clause can then be written as
$C_a(\vec\xi) = \prod_{l=1}^{K} \frac{1}{2}\,(1 + J_{a,l}\,\xi_{j_l^a})$, where
$\xi_{j_l^a} = \pm 1$ depending on the truth value of $j_l^a$, and $J_{a,l} = \pm 1$ if
$j_l^a$ appears negated or directed in the clause $a$. The same variable can appear directed
and negated in different clauses and hence potentially gives rise to frustration.
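
The clause energy above can be illustrated with a minimal Python sketch. The clause
representation (a list of `(variable, J)` pairs) and the function names are assumptions made
for illustration only; the sign convention follows the text, with J = +1 for a negated
occurrence and J = -1 for a directed one.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a clause is a list of (variable index, coupling J),
# with J = +1 if the variable appears negated and J = -1 if it appears directed.
Clause = List[Tuple[int, int]]

def clause_energy(clause: Clause, xi: Dict[int, int]) -> int:
    """C_a = prod_l (1 + J * xi) / 2: equals 1 if the clause is violated, 0 otherwise."""
    energy = 1
    for var, coupling in clause:
        # (1 + J * xi) is either 0 (literal true) or 2 (literal false).
        energy *= (1 + coupling * xi[var]) // 2
    return energy

def total_energy(clauses: List[Clause], xi: Dict[int, int]) -> int:
    """Global energy: the number of violated clauses."""
    return sum(clause_energy(c, xi) for c in clauses)
```

For the single clause (x0 OR NOT x1), written as `[(0, -1), (1, +1)]`, the only assignment
with energy 1 is x0 false and x1 true, i.e. `xi = {0: -1, 1: +1}`.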

The connectivity distribution and the loop structure of the factor graph associated to a CSP
(in Fig. 1, variables are represented as circles connected to the constraints in which they
are involved, depicted as squares) have a strong influence on the behavior of search
algorithms. For many important random CSPs, when the ratio $\alpha = M/N$ lies in a narrow
region $\alpha_d < \alpha < \alpha_c$ (the exact values of the thresholds depending on the
details of the problem and on the chosen random graph ensemble), the problem is still
satisfiable but the zero-energy phase of the associated Hamiltonian breaks down into an
exponential number of clustered components.

FIG. 1: A portion of factor graph

The cavity method of statistical physics, used for accurate analytical computations [9], [10]
of the threshold locations in good agreement with numerical experiments [11], also provides
the theoretical foundation of the Survey Propagation (SP) message-passing algorithm, which
successfully solves instances of both the q-coloring and the K-SAT problem that are hard for
local search algorithms. A constraint node $b$ is supposed to send a message
$\vec u_{b\to j} = \vec e_s$ (where $\vec e_s$ is the vector having just the s-th component
equal to 1) to a variable $j$ each time that $j$ would violate the constraint $b$ by assuming
the value $s$ (a set of messages then corresponds to a cluster of configurations). For locally
tree-like factor graphs (like the ones typically associated to randomly generated instances)
the messages incoming to $j \in V(a) \setminus i$ can be assumed uncorrelated after the
temporary removal of a single clause $a$ and of a variable $i$ (the so-called cavity step,
see Fig. 1). It then becomes possible to evaluate probability density functions for the
messages, called cavity surveys (the probability space originates from the set of all clusters
of satisfying assignments sampled with uniform measure):

$$Q_{b\to j}(\vec u_{b\to j}) = \eta^0_{b\to j}\,\delta(\vec u_{b\to j}, \vec 0) + \sum_{s=1}^{q} \eta^s_{b\to j}\,\delta(\vec u_{b\to j}, \vec e_s) \qquad (1)$$

Here, $\eta^s_{b\to j}$ is the probability that $b$ constrains $j$ not to enter the state $s$,
and $\eta^0_{b\to j}$ is the probability that no message is sent, $b$ being already satisfied
by the assignment of other variables.

The cavity surveys form a closed system of KM functional equations for which a solution can
be found in linear time by iteration:

$$Q_{a\to i}(\vec u_{a\to i}) = \int \mathcal{D}Q \; \mathcal{S}[\{\vec u_{b\to j}\}] \; \chi[\vec u_{a\to i}, \{\vec u_{b\to j}\}] \qquad (2)$$

where the function $\chi$ depends on the specific CSP and
$\mathcal{D}Q = \prod_{j \in V(a)\setminus i} \prod_{b \in V(j)\setminus a} Q_{b\to j}(\vec u_{b\to j})$.
The functional $\mathcal{S}[\{\vec u_{b\to j}\}]$ acts as a filter, assigning null weight to
sets of messages associated to clusters of excited configurations. At the fixed point, from
the knowledge of the surveys one may compute the fractions $W_j^s$ ($W_j^0$) of clusters of
solutions in which a variable $j$ is frozen in the direction $s$ (or is unfrozen). Such
microscopic information can be successfully used to find optimal assignments by decimation.

We shall now present a generalization of the SP algorithm (SP-ext), allowing the retrieval of
a solution close to any desired configuration $\vec\xi$. Hereafter we shall refer for
simplicity to the K-SAT problem ($s = \pm 1$ only), but the method could be easily extended
to a generic CSP. On general grounds, a way to analyze specific regions of the configuration
space would be to solve the cavity equations in the presence of an additional field
conjugated to some geometrical constraint (e.g. fixed magnetization). This strategy would
however be algorithmically inefficient, in that it would change the nature of the components
of the messages (from integers to reals), significantly slowing down the iterative solution
of the SP equations.

Here we consider instead an arbitrary but quite natural extension of the SP equations in
which external messages $\vec u_i = \vec e_{-\xi_i}$ (represented in Fig. 1 as triangles) in
an arbitrary direction $\vec\xi \in \{-1, 1\}^N$ are introduced for each variable. New
associated surveys
$Q_i(\vec u_i) = (1 - \pi)\,\delta(\vec u_i, \vec 0) + \pi\,\delta(\vec u_i, \vec e_{-\xi_i})$
are given a priori and never updated; they dynamically affect the relative weight of the
different clusters by entering into the measure $\mathcal{D}Q$ in the convolution integrals
(2). The parameter $\pi$ can be interpreted as the strength of the perturbation. Convergence
can be reached only if the zero-energy constraint is respected. While an intensity
$\pi \simeq 1$ would produce a complete polarization of the messages if $\vec\xi$ were a
solution, in the general case the use of a smaller forcing intensity allows the system to
react to the contradictory driving and to converge to a set of surveys sufficiently biased in
the desired direction, allowing for an efficient selective exploration of specific parts of
the solution space.
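
To make the role of the forcing concrete, the following is a minimal sketch of one parallel
sweep of survey updates for K-SAT with an optional external forcing. It is not the authors'
implementation. It assumes the standard scalar parametrization of SP for K-SAT, in which each
survey reduces to one number `eta[(a, i)]` (the probability that clause `a` warns variable
`i`), a form this letter does not spell out. It also assumes that the external survey can be
treated as one extra, never-updated warning of probability `pi` against the value
$-\xi_j$. The names `sp_ext_sweep`, `eta`, `forcing` and `pi` are hypothetical, and each
variable is assumed to occur at most once per clause.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[Tuple[int, int]]  # (variable, J); J = +1 negated, J = -1 directed

def sp_ext_sweep(clauses: List[Clause],
                 eta: Dict[Tuple[int, int], float],
                 forcing: Optional[Dict[int, int]] = None,
                 pi: float = 0.0) -> Dict[Tuple[int, int], float]:
    """One parallel sweep of clause-to-variable survey updates (assumed scalar form)."""
    # Occurrence lists: for each variable, the clauses it appears in and its coupling.
    occ: Dict[int, List[Tuple[int, int]]] = {}
    for a, clause in enumerate(clauses):
        for j, coupling in clause:
            occ.setdefault(j, []).append((a, coupling))

    new_eta: Dict[Tuple[int, int], float] = {}
    for a, clause in enumerate(clauses):
        for i, _ in clause:
            prod = 1.0
            for j, j_a in clause:
                if j == i:
                    continue
                p_same = 1.0  # no warning from clauses pushing j as clause a does
                p_opp = 1.0   # no warning from clauses pushing j the other way
                for b, j_b in occ[j]:
                    if b == a:
                        continue
                    if j_b == j_a:
                        p_same *= 1.0 - eta[(b, j)]
                    else:
                        p_opp *= 1.0 - eta[(b, j)]
                if forcing is not None and pi > 0.0:
                    # Clause a is satisfied by j when xi_j = -J_{a,j}; the external
                    # message pushes j toward forcing[j] with probability pi.
                    if forcing[j] == -j_a:
                        p_same *= 1.0 - pi
                    else:
                        p_opp *= 1.0 - pi
                w_unsat = (1.0 - p_opp) * p_same  # j forced to violate a
                w_sat = (1.0 - p_same) * p_opp    # j forced to satisfy a
                w_free = p_same * p_opp           # j receives no warning
                norm = w_unsat + w_sat + w_free
                # norm == 0 signals contradictory warnings; 0.0 is returned here.
                prod *= w_unsat / norm if norm > 0.0 else 0.0
            new_eta[(a, i)] = prod
    return new_eta
```

In this sketch, an isolated clause sends no warnings when `pi = 0`, whereas forcing both of
its variables against the clause with `pi = 0.5` makes each outgoing survey equal to 0.5.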

In order to use SP-ext for probing the local geometry of the zero-energy phase one proceeds
as follows. First, a random solution $\vec\sigma$ is found by decimation (SP-ext, unlike
standard SP, is typically able to retrieve complete solutions). Next, new satisfying
assignments $\vec\sigma_\delta$ are generated, now forcing the system along a direction
obtained by flipping $N\delta$ spins of the original solution $\vec\sigma$. In order to have
a highly homogeneous distribution of the clusters, we have chosen for our experiments an
ensemble of random K-SAT in which variables have fixed degree and are balanced (i.e. have an
equal number of directed and negated occurrences in the clauses). For this specific ensemble
and for K = 5, one has $\alpha_d = 14.8$ and $\alpha_c = 19.53$. Between $\alpha_d$ and
$\alpha_G = 16.77$ the phase is expected to have multiple levels of clustering (a full RSB
phase), whereas between $\alpha_G$ and $\alpha_c$ the 1-RSB phase is stable. We have
estimated $\alpha_G$ with a new
message passing algorithm implementing the cavity equations at the level of two steps of RSB.

FIG. 2: Distribution of distances
[Plot not recoverable. Surviving labels: 1-RSB ($\alpha = 16.8$), f-RSB ($\alpha = 15.2$),
GAP, $d_0 - \Delta$, $d_0 + \Delta$.]

FIG. 3: Rate-Distortion profile for the compression of a random unbiased source
[Plot not recoverable. Horizontal axis: R. Legend: K=5, $\alpha$=16.8, no doping; K=5,
$\alpha$=16.8, doping; K=5, $\alpha$=16.8, doping, 40% BP; K=7, $\alpha$=61.4, doping, 40% BP;
Random Guessing; Shannon bound.]

In the experiments we have taken instances of size $N = 10^4$ with an intensity $\pi = 0.35$
of the forcing (close to the highest value of $\pi$ for which the SP-ext equations always
converged for this sample). The Hamming distance $d_H(\vec\sigma, \vec\sigma_\delta)$ between
$\vec\sigma$ and $\vec\sigma_\delta$ is plotted against $\delta$ in the first stability
diagram (black data points) shown in Fig. 2. For $\delta < \delta_c = 0.3$,
$d_H(\vec\sigma, \vec\sigma_\delta)$ increases linearly with a very small slope up to a value
$d_{cl}$. Conversely, for $\delta > \delta_c$, it jumps to a value $d_0 - \Delta$, and a
symmetric distribution of distances around $d_0$ is obtained. Under the hypothesis of
homogeneous distribution of clusters, the fixed point average site magnetization
$(W^+ - W^-)$ provides an analytic estimate of the typical overlap $q_0$ between two
different clusters, in agreement with the experiments. On the other hand, $d_{cl}$ is of the
order of the average fraction of unfrozen variables. The gap between clusters is the main
prediction of the 1-RSB cavity theory, which finds a clear confirmation in these experiments.
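
The bookkeeping of this stability experiment can be sketched as follows. The snippet covers
only the construction of the forcing direction (flipping $N\delta$ spins of a reference
configuration) and the normalized Hamming distance. The call to SP-ext that turns the forcing
direction into a new solution is not shown, and the function names are hypothetical.

```python
import random
from typing import List

def flip_fraction(sigma: List[int], delta: float, rng: random.Random) -> List[int]:
    """Return a copy of sigma with round(N * delta) randomly chosen spins flipped."""
    n_flip = round(len(sigma) * delta)
    flipped = set(rng.sample(range(len(sigma)), n_flip))
    return [-s if i in flipped else s for i, s in enumerate(sigma)]

def hamming_distance(a: List[int], b: List[int]) -> float:
    """Normalized Hamming distance: the fraction of sites on which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

By construction the forcing direction lies at distance $\delta$ from $\vec\sigma$. The
quantity plotted in the stability diagram is instead the distance between $\vec\sigma$ and
the solution that SP-ext returns under that forcing.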

A completely different behavior is observed when repeating the experiment in the expected
full RSB phase. The histogram of the reciprocal distances among all the generated solutions
(inset of Fig. 2, gray bars) is now gapless, unlike the previous sample (black bars), and the
related stability plot in the main figure (white data points) deviates significantly from the
1-RSB case. Various shapes of the stability plot, differing in their degree of convexity, can
be obtained starting from different solutions of the same sample. This hints at the existence
of a mixed phase, in which higher-order hierarchical clustering is present and in which many
local cluster distribution topologies can coexist.

The ability of SP-ext to select clusters is a new computational feature which may play a role
in different systems in which clustering of input data is important. Here we consider a first
basic application by implementing a lossy data compressor which exploits the 1-RSB clustered
structure for data quantization purposes.

Let us suppose we have an input N-bit binary string $\vec\xi$ generated from an unbiased and
uncorrelated random source. Given an appropriate K-SAT instance with N variables, a solution
$\vec\sigma_\xi$ as close as possible to $\vec\xi$ can be generated with SP-ext. One can
expect to find a solution at a distance close to $d_0 - \Delta$ if the cluster distribution
is homogeneous (balanced instances with fixed even connectivity are therefore chosen).
Furthermore, $\alpha$ is taken slightly larger than $\alpha_G$, in order to maximize the
number of addressable clusters while still preserving a sharp separation among them. At this
point, a compressed string is built by retaining just the spins of the first NR variables of
$\vec\sigma_\xi$. In the decompression stage, SP-ext is run over the same graph, applying a
very intense forcing ($\pi = 0.99$) parallel to the compressed string. If $R > R_c$, SP-ext
becomes able to select exactly the single cluster to which $\vec\sigma_\xi$ belongs. The
cluster addressing is actually so sharp that no decimation is needed, and all the remaining
N(1 - R) unforced variables can simultaneously be fixed to their preferred orientation
without creating contradictions. A comparison with the theoretical Shannon bound [7] is done
in Fig. 3 (dotted line), where the cluster selection transition is clearly visible. The
accumulated distortion with respect to $\vec\xi$ is of the order of $d_{cl} + d_0 - \Delta$
(the horizontal lines in Fig. 3 refer to the original distance between $\vec\xi$ and
$\vec\sigma_\xi$, and, obviously, no better distortion can be achieved in this scheme). The
line relative to the performance of a trivial decoder in which the missing bits are randomly
guessed is also plotted.
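
For reference, the Shannon bound used in this comparison can be evaluated directly. For an
unbiased memoryless binary source with Hamming distortion, the standard rate-distortion
function is $R(D) = 1 - H_2(D)$ for $0 \le D \le 1/2$, where $H_2$ is the binary entropy. The
sketch below computes it; the function names are chosen here for illustration.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def shannon_rate(distortion: float) -> float:
    """Minimum rate R(D) = 1 - H2(D) for an unbiased binary source, 0 <= D <= 1/2."""
    if distortion >= 0.5:
        return 0.0  # half of the bits wrong can be achieved without storing anything
    return 1.0 - binary_entropy(distortion)
```

Any compressor operating at rate R on such a source has a distortion D that satisfies
`shannon_rate(D) <= R`, which is the limit against which the curves of Fig. 3 are compared.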

The Shannon bound can be approached by the use of an iterative doping technique for choosing
the bits to store. After the determination of $\vec\sigma_\xi$, SP-ext is run again without
applying any forcing and a ranking of the most balanced variables is performed. One looks for
the variable $i$ which minimizes $|W_i^+ - W_i^-| + W_i^0$ (frozen in opposite directions in
a similar number of clusters and rarely unconstrained). The state assumed by $i$ in the
solution $\vec\sigma_\xi$ will be taken as the first bit of the compressed string and used to
fix $i$. New doping steps are performed until the desired compression rate has been reached.
An identical doping stage is then performed in decompression. The iterative ranking indeed
allows one to find out which spins have to be fixed according to the ordered bit sequence,
and the remaining variables can be fixed as in the previous decompression method. Fixing a
balanced variable "switches off" a larger number of clusters and, hence, the critical rate is
reduced (dash-dotted curve in Fig. 3), thanks to a less redundant coding of the information
needed for the cluster selection. A further improvement can be obtained by using in the
doping stage a modified iteration that interpolates between SP and the well-known Belief
Propagation equations (solid curve in Fig. 3).

SP-ext can also be applied when the occurrence probabilities $P_\pm$ of the possible input
symbols are different. The issue is of practical relevance since correlated sources can often
be shown to be equivalent to memoryless biased sources. Let us suppose that the bias of the
source is $b = P_+ - P_- > 0$. It is possible to engineer graphs with a cluster distribution
concentrated around the ferromagnetic direction, by making for every variable $i$ the
fraction $\gamma_+$ of couplings $J_{a,i} = +1$ larger than the fraction $\gamma_-$ of
$J_{a,i} = -1$. As shown indeed in Fig. 4, a narrow SAT RSB stripe is still present for a
balancing $B = \gamma_+ - \gamma_- < 0.435$. When the balancing is too large, there is on the
other hand a direct transition between an unfrozen SAT phase and the UNSAT region (the line
where optimized heuristics start to fail in determining a solution approximately continues
the 1-RSB SAT/UNSAT transition line). The rate-distortion profile of the compression of a
random uncorrelated source with $b = 0.2$ and $b = 0.3$ is shown in the inset of the same
figure. The best graphs found for all the analyzed values of $b$ are always located in
proximity of the Gardner line (empty circles in Fig. 4; better critical rates but worse
distortions are found for higher connectivities). Cluster selection is still possible and
again ensures a non-random correlation between the input and the output string. It is
nevertheless expected that much better performance can be achieved by a careful optimization
of the graph ensemble, as was shown for iterative decoding with Belief Propagation.

We conclude by noting that work is in progress to obtain a fully local version of the
decimation procedure leading to the different solutions, by means of a local reinforcement
technique: this might be important for the parallelization of the SP algorithm and for
modeling distributed computation in complex networks.